Efficient Ad-hoc Approximate Query Processing in Peer-to-Peer Databases

Authors

  • Benjamin Arai
  • Gautam Das
  • Dimitrios Gunopulos
  • Vana Kalogeraki
Abstract

This paper appeared in the 22nd International Conference on Data Engineering (ICDE), Atlanta, Georgia, 2006.

Peer-to-peer databases are becoming prevalent on the Internet for the distribution and sharing of documents, applications, and other digital media. The problem of answering large-scale, ad-hoc analysis queries (e.g., aggregation queries) on these databases poses unique challenges. Exact solutions can be time-consuming and difficult to implement given the distributed and dynamic nature of peer-to-peer databases. In this paper we present novel sampling-based techniques for approximate answering of ad-hoc aggregation queries in such databases. Computing a high-quality random sample of the database efficiently in the P2P environment is complicated by several factors: the data is distributed (usually in uneven quantities) across many peers, within each peer the data is often highly correlated, and even collecting a random sample of the peers is difficult to accomplish. To counter these problems, we have developed an adaptive two-phase sampling approach based on random walks of the P2P graph as well as block-level sampling techniques. We present extensive experimental evaluations to demonstrate the feasibility of our proposed solution.
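To make the idea in the abstract concrete, here is a minimal sketch of random-walk peer sampling combined with local sub-sampling to estimate a global SUM. It is not the authors' algorithm: the function names, the `jump` parameter, the degree-weighted correction, and the assumption that the total degree of the graph is known are all illustrative choices, and plain row sampling stands in for the block-level sampling the abstract mentions.

```python
import random


def random_walk_peers(graph, start, num_samples, jump, rng):
    """Walk the peer graph and keep every `jump`-th visited peer.

    `graph` maps a peer id to a list of neighbour ids (undirected).
    """
    samples = []
    current = start
    while len(samples) < num_samples:
        for _ in range(jump):
            current = rng.choice(graph[current])
        samples.append(current)
    return samples


def local_sum_estimate(rows, sample_size, predicate, rng):
    """Estimate one peer's local SUM from a uniform sub-sample of its rows."""
    if not rows:
        return 0.0
    k = min(sample_size, len(rows))
    picked = rng.sample(rows, k)
    partial = sum(v for v in picked if predicate(v))
    # Scale the sub-sample sum up to the peer's full local data size.
    return partial * len(rows) / k


def estimate_total_sum(graph, data, start, num_samples=200, jump=5,
                       rows_per_peer=20, predicate=lambda v: True, seed=0):
    """Estimate SUM over all peers' rows that satisfy `predicate`.

    Assumption: a long random walk visits a peer with probability
    proportional to its degree, so each local estimate is divided by
    degree / (sum of all degrees) before averaging.
    """
    rng = random.Random(seed)
    # Hypothetical simplification: the total degree is computed from the
    # full graph here; a real P2P system would have to estimate it.
    total_degree = sum(len(nbrs) for nbrs in graph.values())
    peers = random_walk_peers(graph, start, num_samples, jump, rng)
    total = 0.0
    for p in peers:
        visit_prob = len(graph[p]) / total_degree
        local = local_sum_estimate(data[p], rows_per_peer, predicate, rng)
        total += local / visit_prob
    return total / len(peers)
```

A call such as `estimate_total_sum(graph, data, start=0)` returns an approximate answer whose accuracy depends on the number of sampled peers, the jump length, and how skewed the data is across peers. The sketch uses fixed parameters and does not attempt the adaptive tuning that the abstract describes.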

Similar resources

Partial read from peer-to-peer databases

In this paper we propose Scoop, a mechanism to implement the “partial read operation” for peer-to-peer databases. A peer-to-peer database is a database whose relations are horizontally fragmented and distributed among the nodes of a peer-to-peer network. The partial read operation is a data retrieval operation required for approximate query processing in peer-to-peer databases. A partial rea...

An Efficient Hybrid Algorithm to Reduce Latency in Ad-Hoc Aggregation

A data warehouse is a collection of data gathered and organized so that it can easily be analyzed, extracted, and synthesized, and used to further understand the data. Peer-to-peer networks are used for distribution and sharing of documents. In traditional techniques, when aggregate functions like average, sum and count are encountered, the aggregate operation is performed by ...

A Gnutella-based P2P System Using Cross-Layer Design for MANET

It is expected that the ubiquitous era will come soon. A ubiquitous environment has features like peer-to-peer and nomadic environments. Such features can be represented by peer-to-peer systems and mobile ad-hoc networks (MANETs). The features of P2P systems and MANETs are similar, which makes it appealing to implement P2P systems in a MANET environment. It has been shown, however, that the performance of the P...

Context-Aware Query Processing in Ad-Hoc Environments of Peers

In this article, we deal with context-aware query processing in ad-hoc peer-to-peer networks. Each peer in such an environment has a database over which users execute queries. This database involves (a) relations which are locally stored and (b) virtual relations, all the tuples of which are collected from peers that are present in the network at the time when a query is posed. The objective of...

An Enhanced Searching Algorithm over Unstructured Mobile P2P Overlay Networks

To discover objects of interest in unstructured peer-to-peer networks, the peers rely on flooding query messages which create incredible network traffic. This article evaluates the performance of an unstructured Gnutella-like protocol over mobile ad-hoc networks and proposes modifications to improve its performance. This paper offers an enhanced mechanism for an unstructured Gnutella-like netwo...

Publication date: 2006